A Behavior Based Kernel for Policy Search via Bayesian Optimization
Authors
Abstract
We build on past successes in applying Bayesian Optimization (BO) to the Reinforcement Learning (RL) problem. BO is a general method for searching for the maximum of an unknown objective function. It explicitly aims to reduce the number of samples needed to identify the optimal solution by exploiting a probabilistic model of the objective function. Much work in BO has focused on Gaussian Process (GP) models of the objective. The performance of these models relies on the design of the kernel function relating points in the solution space. Unfortunately, previous approaches adapting ideas from BO to the RL setting have focused on simple kernels that are not well justified in the RL context. We show that a new kernel can be motivated by examining an upper bound on the absolute difference in expected return between policies. The resulting kernel explicitly compares the behaviors of policies in terms of their trajectory probability densities. We incorporate the behavior-based kernel into a BO algorithm for policy search. Results reported on four standard benchmark domains show that our algorithm significantly outperforms state-of-the-art alternatives.
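As a rough illustration of the idea, the sketch below implements one plausible reading of a behavior-based covariance: two policies are similar when they assign similar probability to the same trajectories. The linear-Gaussian policy class, the fixed noise scale, and the names `log_policy_prob` and `behavior_kernel` are all assumptions made for the example, not the paper's exact construction.

```python
import numpy as np

def log_policy_prob(theta, trajectory, sigma=1.0):
    """Log-probability of a trajectory's actions under an illustrative
    linear-Gaussian policy a ~ N(theta . s, sigma^2)."""
    logp = 0.0
    for s, a in trajectory:  # trajectory is a list of (state, action) pairs
        mean = float(np.dot(theta, s))
        logp += -0.5 * ((a - mean) / sigma) ** 2 \
                - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
    return logp

def behavior_kernel(theta1, theta2, trajectories, alpha=1.0):
    """Behavior-based covariance: compares the trajectory log-densities
    induced by two policies. The absolute-difference form is a stand-in
    for the paper's bound-derived comparison (positive semi-definiteness
    is not guaranteed in general)."""
    d = np.mean([abs(log_policy_prob(theta1, t) - log_policy_prob(theta2, t))
                 for t in trajectories])
    return float(np.exp(-alpha * d))
```

One practical attraction of a construction like this is that the trajectories used to compare two policies can, in principle, be reused from earlier policy executions, so evaluating the kernel need not cost additional environment interaction.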
Similar Papers
Sample Efficient Bayesian Optimization for Policy Search: Case Studies in Robotics and Education
In this work we investigate the problem of learning adaptive strategies, called policies, in domains where evaluating different policies is costly. We formalize the problem as direct policy search: searching the space of policy parameters to identify policies that perform well with respect to a given objective. Bayesian Optimization is one method suitable for such settings, when sample/data eff...
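To make the direct policy search framing concrete, here is a minimal sketch of the kind of black-box objective such methods optimize. The `average_return` name, the `policy(theta, state)` signature, and the Gym-style environment interface are assumptions for illustration.

```python
import numpy as np

def average_return(theta, env, policy, n_episodes=5, horizon=200):
    """Black-box objective for direct policy search: mean episodic return
    of the policy with parameters `theta`."""
    returns = []
    for _ in range(n_episodes):
        s = env.reset()
        total = 0.0
        for _ in range(horizon):
            a = policy(theta, s)          # maps (parameters, state) -> action
            s, r, done, _ = env.step(a)   # Gym-style transition (assumption)
            total += r
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))
```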
Bayesian optimization for automated model selection
Despite the success of kernel-based nonparametric methods, kernel selection still requires considerable expertise, and is often described as a “black art.” We present a sophisticated method for automatically searching for an appropriate kernel from an infinite space of potential choices. Previous efforts in this direction have focused on traversing a kernel grammar, only examining the data via ...
Active Contextual Entropy Search
Contextual policy search allows adapting robotic movement primitives to different situations. For instance, a locomotion primitive might be adapted to different terrain inclinations or desired walking speeds. Such an adaptation is often achievable by modifying a small number of hyperparameters. However, learning, when performed on real robotic systems, is typically restricted to a small number ...
Factored Contextual Policy Search with Bayesian Optimization
Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different "contexts". Bayesian optimization approaches to contextual policy search (CPS) offer data-efficient policy learning that generalizes over a context space. We propose to improve data-efficiency by factoring typically considered contexts into two compon...
Using trajectory data to improve Bayesian optimization for reinforcement learning
Recently, Bayesian Optimization (BO) has been used to successfully optimize parametric policies in several challenging Reinforcement Learning (RL) applications. BO is attractive for this problem because it incorporates Bayesian prior information about the expected return and exploits this knowledge to select new policies to execute. Effectively, the BO framework for policy search addresses the expl...
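The selection step described here can be sketched with a standard GP posterior plus an expected-improvement acquisition. The snippet below is a generic illustration of that machinery, not the specific algorithm from this paper; the helper names and the Cholesky-based posterior are standard choices assumed for the sketch.

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(K, K_star, k_diag, y, noise=1e-6):
    """GP predictive mean and variance at candidate policies.
    K: kernel matrix over evaluated policies; K_star: cross-covariances
    (candidates x evaluated); k_diag: prior variances of the candidates;
    y: observed returns of the evaluated policies."""
    L = np.linalg.cholesky(K + noise * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.maximum(k_diag - np.sum(v * v, axis=0), 1e-12)
    return mu, var

def expected_improvement(mu, var, best_y):
    """Score candidates by expected improvement over the best return so
    far; the next policy to execute is the argmax of this score."""
    std = np.sqrt(var)
    z = (mu - best_y) / std
    return (mu - best_y) * norm.cdf(z) + std * norm.pdf(z)
```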